Abstract

We use the Falcone-Takesaki non-commutative flow of weights and the resulting theory
of non-commutative $L_p$ spaces in order to define the family of relative entropy
functionals that naturally generalise the quantum relative entropies of Jencova- 
Ojima and the classical relative entropies of Zhu-Rohwer, and belong to an in- 
tersection of families of Petz relative entropies with Bregman relative entropies. 
For the purpose of this task, we generalise the notion of Bregman entropy to the 
infinite-dimensional non-commutative case using the Legendre-Fenchel duality. In 
addition, we use the Falcone-Takesaki duality to extend the duality between coarse- 
grainings and Markov maps to the infinite-dimensional non-commutative case. Fol- 
lowing the recent result of Amari for the Zhu-Rohwer entropies, we conjecture that 
the proposed family of relative entropies is uniquely characterised by the Markov 
monotonicity and the Legendre-Fenchel duality. The role of these results in the 
foundations and applications of quantum information theory is discussed. 

dedicated to Professor Roman S. Ingarden on the occasion of his 90th birthday

1 Introduction 

Information theory can be separated into two parts: information kinematics (described
in terms of information geometry theory) and information dynamics (described
in terms of inductive inference theory). Information kinematics incorporates the descrip- 
tion of convex and smooth geometric structures on the spaces of information states, while 
information dynamics incorporates the methods of statistical inference such as Bayes' 
rule, maximum likelihood, and constrained relative entropic updating. Both parts of 
information theory share two underlying mathematical notions: information model and 
information deviation (relative entropy). Following the approach initiated by Ingarden et 
al [73] and Eguchi [44, 45] (and continued in particular in [99, 69, 135, 83]), we think that 
the further structures of the information geometry theory, such as riemannian metrics or 
affine connections on information models, should follow from the additional requirements 
imposed on these two notions. However, this requires us to define information models 
and information deviations on the level of generality that covers all regimes of informa- 
tion geometry, ranging from the finite-dimensional commutative normalised case to the 
infinite-dimensional non-commutative non-normalised case. 

We define an information model as a subset $\mathcal{M}(\mathcal{N})$ of the space $\mathcal{N}_*^+$ of all positive,
finite, normal, linear $\mathbb{C}$-valued functions on a given commutative or non-commutative
$W^*$-algebra $\mathcal{N}$.² Elements of $\mathcal{N}_*^+$ will be called states (or information states). If the
additional normalisation condition is assumed, then the resulting space $\mathcal{N}_{*1}^+ \subseteq \mathcal{N}_*^+$
consists of all normal algebraic states on $\mathcal{N}$. For commutative $\mathcal{N}$, $\mathcal{N}_*^+$ reduces to a space
$L_1(\mho)^+$ of Daniell-Stone integrals on a Banach vector lattice of characteristic functions on
the Stone spectrum of a countably finite Dedekind complete maharanisable algebra $\mho$ that
is canonically associated with $\mathcal{N} = L_\infty(\mho)$. In this case $\mathcal{N}_{*1}^+$ reduces to a space $L_1(\mho)_1^+$
of all normal (hence, monotonically continuous) expectation functionals on $\mathcal{N}$, which
bijectively corresponds to the space of all positive probability measures on $\mho$.

In the commutative and normalised case $\mathcal{N}_{*1}^+$ reduces to a space of all normal (hence,
monotonically continuous) expectation functionals on $\mathcal{N}$, which is a space of Daniell-Stone
integrals on a Banach vector lattice that bijectively corresponds to the space V of
all positive probability measures on a countably finite Dedekind complete boolean algebra
$\mho$ that is derived from $\mathcal{N}$.

An information deviation (also referred to as: negative relative entropy, information
divergence, information gain, loss function, risk function, contrast functional) is defined
as a map $D : \mathcal{N}_*^+ \times \mathcal{N}_*^+ \to [0,\infty]$ such that $D(\omega,\phi) = 0$ iff $\omega = \phi$. It plays the role of a
non-symmetric distance functional on $\mathcal{N}_*^+$, and serves as a principal tool for quantification of
relative information content of states.
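
As a minimal illustration of this definition (an example chosen here, not one singled out by the text above), assume the finite-dimensional commutative case $\mathcal{N} = \mathbb{C}^n$, so that faithful states are vectors $\omega = (\omega_1, \ldots, \omega_n)$ with $\omega_i > 0$, not necessarily normalised. The non-normalised Kullback-Leibler deviation is then one instance of such a map $D$:

```latex
D(\omega,\phi) := \sum_{i=1}^{n} \left( \omega_i \log\frac{\omega_i}{\phi_i} - \omega_i + \phi_i \right)
               = \sum_{i=1}^{n} \phi_i \, f\!\left(\frac{\omega_i}{\phi_i}\right),
\qquad f(t) := t\log t - t + 1 .
% f is strictly convex on ]0,\infty[ with f(1) = 0 and f'(1) = 0, hence f(t) \geq 0 and f(t) = 0 iff t = 1:
D(\omega,\phi) \geq 0, \qquad D(\omega,\phi) = 0 \iff \omega = \phi .
```

The asymmetry is visible directly: exchanging $\omega$ and $\phi$ changes the value in general, which is why $D$ is only a non-symmetric analogue of a distance.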

The above definitions of information model and information deviation are suitable also 
for the purposes of information dynamics theory (see [91, 92]). The main goal of informa- 
tion theory is to provide particular instances of inductive inference (information dynam- 
ics), so the derivation of the particular form of deviation functional from the requirements 
imposed on the information dynamics is an important task. This method of axiomati- 
sation was advanced in particular in [143, 144, 85, 159, 160, 145, 113, 114, 37, 156, 23]. 
However, we would like to keep the kinematic and dynamic properties of information the- 
ory separated for a while, allowing for a better understanding of information geometric 
kinematics on its own right. Hence, in this work we will consider only the information 
kinematics and, more specifically, the problem of unique selection of a particular form of 
information deviation functional by the mathematical conditions that are plausible from 
the perspective of information kinematic applications. There exist also many axiomatisa- 
tions of information deviations provided this way (see e.g., [1, 36, 42, 121, 111, 123, 38, 4]). 

The virtue of (kinematic or dynamic) axiomatic characterisation is that it allows one to

² An additional subscript $_1$ will be used to denote the subset of normalised ($\omega(\mathbb{I}) = 1$) elements, while
an additional subscript $_0$ will be used to denote the subset of faithful ($\omega(x^*x) = 0 \Rightarrow x = 0$ $\forall x \in \mathcal{N}$)
elements. A function $\omega : B \to \mathbb{K}$, where $B$ is a Banach space and $\mathbb{K} \in \{\mathbb{R},\mathbb{C}\}$, is called normal
iff $\omega(\sup \mathcal{F}) = \sup_{x \in \mathcal{F}} \omega(x)$ for each directed filter $\mathcal{F} \subseteq B$ with the upper bound $\sup \mathcal{F}$. The weak-*
topology on the Banach dual $B^{\mathrm{B}}$ of $B$ is defined as the weakest topology on $B^{\mathrm{B}}$ such that the linear
functions $B^{\mathrm{B}} \ni \omega \mapsto \omega(x) \in \mathbb{K}$ are continuous for every $x \in B$. A $\mathbb{K}$-linear function on $B$ is normal
iff it is continuous in weak-* topology. A $C^*$-algebra is a Banach space $A$ over $\mathbb{K}$ such that it is
also a ring over $\mathbb{K}$ with unit $\mathbb{I}$ and operation $^*$ defined by $(ab)^* := b^*a^*$, $(\lambda_1 a + \lambda_2 b)^* := \lambda_1^* a^* + \lambda_2^* b^*$
(where $^* : \mathbb{K} \to \mathbb{K}$ denotes the complex conjugation), such that $\|xy\| \leq \|x\|\|y\|$, $\|\mathbb{I}\| = 1$, $\|x^*x\| = \|x\|^2$
$\forall x,y \in A$. A $W^*$-algebra is a $C^*$-algebra $\mathcal{N}$ such that there exists a Banach space $\mathcal{N}_*$, called predual,
satisfying $(\mathcal{N}_*)^{\mathrm{B}} = \mathcal{N}$. The set of normal linear $\mathbb{K}$-valued (hence, weak-* continuous) functions on $\mathcal{N}$ is
precisely a predual of $\mathcal{N}$. A weight on a $W^*$-algebra $\mathcal{N}$ is a function $\omega : \mathcal{N}^+ \to [0,+\infty]$ such that
$\omega(0) = 0$, $\omega(x + y) = \omega(x) + \omega(y)$, $\lambda \geq 0 \Rightarrow \omega(\lambda x) = \lambda\omega(x)$. A weight $\omega$ is called semi-finite iff
$\forall x \in \mathcal{N}^+$ $\exists y \in \mathcal{N}^+$ ($x \geq y$, $y \neq 0$, $\omega(y) < +\infty$). A weight is called a trace iff $\omega(u^*xu) = \omega(x)$ for all
unitary $u \in \mathcal{N}$. The space of all semi-finite normal weights on $\mathcal{N}$ will be denoted $W(\mathcal{N})$.
equip the characterised object with an unambiguous justification that is grounded in
the particular interpretation of the elementary axioms. Yet, a single mathematical
axiomatisation can be equipped with arbitrarily many conceptual interpretations, hence the
relevance of this point can be questioned. Nevertheless, for the purposes of applica- 
tions of the information theory, at least some minimal purely operational interpretation 
is required anyway, hence the axiomatic characterisation of the information geometric 
(and information dynamic) structures is important. The variety of different applications 
and axiomatisations suggests strongly that there is no 'absolute' or 'universal' method 
of model construction, as well as there is no 'universal' choice of information deviation 
and no 'universal' choice of a method of inductive inference. In consequence, the aim 
of mathematical inquiry is to identify various sets of conditions such that each set is 
necessary and sufficient to select some particular category of models and some particular 
information deviation (as well as some particular inductive inference rule). While the 
statistical inference (dynamics) and information geometric modelling (kinematics) based
on different sets of conditions are incommensurable, each set on its own is a valid universe
for information theoretic (statistical, inferential, probabilistic, quantum) inquiry. 

This paper aims at the problem of identification of a set of conditions underlying 
the particular information geometry theory that was originally developed along the lines 
of requirement of monotonicity of information geometric structures (deviations, met- 
rics, connections) under the Markov morphisms of information models. The results of 
Chentsov [25, 26] showed that the constraint of monotonicity under Markov maps is strong 
enough to provide interesting characterisations of unique (Fisher-Rao [53, 130, 14, 79]) 
riemannian metrics and one-parameter family of (Chentsov-Amari [25, 26, 2]) affine
connections in normalised finite-dimensional commutative case. Due to work of Morozova
and Chentsov [105] and final result of Petz [122, 124], a family of riemannian metrics in 
normalised finite-dimensional non-commutative case was characterised by the same condi- 
tion (an explicit integral representation of the Morozova-Chentsov-Petz family was given 
by Hansen [67]). However, further results showed that monotonicity under Markov maps 
is insufficient for fine analysis of geometries of quantum states in non-commutative case, 
even in finite-dimensional normalised regime. This follows from the results of Jencova 
[80, 82, 83] characterising the family of markovian monotone connections, as well as 
related results of Hasegawa and Petz [127, 70, 69], Grasselli and Streater [60, 59, 58] 
and Gibilisco and Isola [55, 56, 57] on characterisation of a subfamily of the Morozova- 
Chentsov-Petz metrics known as the Wigner-Yanase-Dyson metrics [164, 68], together 
with its boundary known as the Bogolyubov-Kubo-Mori metric [15, 93, 103]. This leads 
to a question about the additional principle of structural properties which would enable 
an improved characterisation. 

As already remarked, Ingarden et al [73] and Eguchi [44, 45] observed that the struc- 
tures of riemannian metric and affine connections on information manifold can be de- 
rived from information deviation functional by suitable differentiation of the arguments 
and passing to the limit with one argument converging to other. In particular, the 
Fisher-Rao metric and the family of Chentsov-Amari connections can be derived in 
finite-dimensional commutative case by differentiation of the family $D_p$ of Zhu-Rohwer
deviations [171, 172, 173]. This approach was continued also in the finite-dimensional 
non-commutative case [99, 69, 83]. This leads us to wonder whether (and which of) 
the properties of information deviation could be used to characterise the structures of
information geometry. Amari [4] recently showed that the Zhu-Rohwer deviations are 
characterised (under some mild auxiliary conditions) as an intersection of the families of 
Csiszar-Morimoto [34, 35, 104] and Bregman [19] deviations. While the former family 
is characterised [36, 37] by monotonicity under Markov maps, the latter is characterised 
[86, 37] by additive decomposition under projection on the subset that is 'orthogonal' to 
the projection in the sense of Legendre duality. 

This leads us to consider duality as a principle of characterisation that is equally 
fundamental as markovian monotonicity. However, as opposed to perspective present in 
standard treatments such as [107, 44, 45, 98, 97, 5, 169], we require also that the dual- 
istic properties should be independent of differential properties, because the former are 
essentially related to variational component of the theory, which in infinite-dimensional 
setting becomes quite independent of the smooth component of the theory. Consideration 
of non-smooth infinite-dimensional duality as a principle that is as important as Markov 
monotonicity leads us to deny the foundational role of differential geometric Norden-Sen 
duality [109, 142] in favour of variational convex Legendre-Fenchel duality [51, 20]. Turn- 
ing this idea into structure, we define the generalised Bregman deviation in terms of the 
convex Legendre-Fenchel duality on the dualised vector spaces, with no differentiability 
and no topological continuity properties required a priori. 

Jencova [81, 84] and Ojima [112] introduced a family $D_p$ of quantum deviation
functionals on $\mathcal{N}_*^+$, defined using the dual structure of non-commutative Araki-Masuda
$L_p(\mathcal{N},\phi)$ spaces [10, 101]. These deviations are well-defined in the non-commutative
infinite-dimensional case, and belong to an intersection of the families of generalised 
Bregman deviations and Markov monotone Petz deviations [119, 120] (the latter are non- 
commutative generalisation of the Csiszar-Morimoto deviations). In particular cases, the 
Jencova-Ojima deviations reduce to relative entropies of Hasegawa [68] and Umegaki-Araki
[157, 7, 8]. However, the Araki-Masuda $L_p(\mathcal{N},\phi)$ space is not a canonical
non-commutative $L_p$ space construction, due to its dependence on an arbitrary weight $\phi$. In
consequence, the Jencova-Ojima deviation is not a canonical non-commutative generalisation
of the Zhu-Rohwer deviation, because the construction of the former depends on
the choice of reference weight $\phi$, while the latter is independent of any choice of this kind.

We propose a solution to this problem, which is based on the Falcone-Takesaki theory 
[48] of non-commutative flow of weights and associated construction of non-commutative 
$L_p(\mathcal{N})$ spaces that does not depend on a choice of reference weight. Using its remarkable
properties, we define the family $D_p$ of information deviation (relative entropy) functionals
which is the canonical generalisation of Jencova-Ojima and Zhu-Rohwer families of
deviations. We show that our family $D_p$ of information deviations belongs not only to
Petz's class of Markov monotone deviations, but also to the class of generalised Bregman
deviations. As a non-commutative counterpart to the result of Amari [4], we conjecture
that the family of $D_p$ deviations is uniquely characterised as intersection of Markov
monotone deviations with generalised Bregman deviations. This requirement states that
quantum information deviations should be monotone under (Markov) coarse-graining of
information and should allow additive decompositions under (Legendre-Fenchel) orthogonal
projections on closed convex affine information subspaces.

This way the family $D_p$ of information deviations on $\mathcal{M}(\mathcal{N}) \subseteq \mathcal{N}_*^+$ becomes equipped
with an information theoretic interpretation, and the same is true for further information
geometric structures on $\mathcal{M}(\mathcal{N})$ derived from these deviations, such as riemannian metrics
and affine connections. This interpretation refers to the notion of information as a fun- 
damental entity, and is not involved in the conceptual problems associated with the 
interpretations of probability and the interpretations of Hilbert space based quantum 
theory. 

From the perspective of applications, this allows one to discuss the algebraic models of
quantum statistical mechanics and quantum field theory as particular cases of quantum
information theory. The use of the Falcone-Takesaki theory and the Legendre-Fenchel
duality advocated in this paper is a key novel technique that allows one to derive various
new results in the infinite-dimensional non-commutative case, where neither the suitable
differentiability properties nor the description in terms of density operators is available.
From the perspective of foundations, this provides a major change of perspective:
instead of separate consideration of statistical theory and quantum theory, with their
separate problems of model construction, one obtains a single framework covering all
theories at once. The model construction techniques based on spectral theory can be
replaced by the model construction techniques based on quantum information geometry
(which includes the spectral theory as a special case), with the family $D_p$ of information
deviations playing the central role (and replacing the role played by the Hilbert space
norm).

2 Information models 

2.1 Non-commutative flow of weights 

In what follows, we will often identify the $W^*$-algebra $\mathcal{N}$ with its standard representation
von Neumann algebra $\pi(\mathcal{N})$ on the standard representation Hilbert space $\mathcal{H}$ [63]. Every
$W^*$-algebra $\mathcal{N}$ has a unique standard representation, up to unitary isomorphism. This
representation is faithful ($\ker(\pi) = \{0\}$) and is unitarily isomorphic with the
Gel'fand-Naimark-Segal representation [138] for a pair $(\mathcal{N},\omega)$, whenever $\omega \in \mathcal{N}_{*0}^+$ or $\omega \in W_0(\mathcal{N})$.
While the set $\mathcal{N}_{*0}^+$ is non-empty iff $\mathcal{N}$ is countably finite (i.e., it is isomorphic to a von
Neumann algebra possessing a cyclic and separating vector), $W_0(\mathcal{N}) \neq \varnothing$ for every
$W^*$-algebra $\mathcal{N}$.

Two crucial elements of the Falcone-Takesaki theory [48] are the core von Neumann
algebra $\widetilde{\mathcal{N}}$ associated functorially to any von Neumann algebra $\mathcal{N}$, and the resulting
canonical construction of non-commutative $L_p(\mathcal{N})$ spaces, which is independent of any
reference weight or state.

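For $x,x' \in \mathcal{N}$, $\omega,\omega' \in W_0(\mathcal{N})$, and $(D\omega : D\omega')_t$ denoting Connes' cocycle [28] (which is
a generalisation of the Radon-Nikodym derivative to the non-commutative case), one
defines the equivalence relation $\sim_t$ on $\mathcal{N} \times W_0(\mathcal{N})$ by

    $(x,\omega) \sim_t (x',\omega') \iff x' = x\,(D\omega : D\omega')_t$.    (1)
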
The set of equivalence classes $(\mathcal{N} \times W_0(\mathcal{N}))/\sim_t$ is denoted by $\mathcal{N}(t)$, and its elements are denoted
by $x\omega^{it}$. The definitions $x\omega^{it} + y\omega^{it} := (x + y)\omega^{it}$, $\lambda(x\omega^{it}) := (\lambda x)\omega^{it}$ for $\lambda \in \mathbb{C}$, and
$\|x\omega^{it}\| := \|x\|$ equip $\mathcal{N}(t)$ with the structure of a Banach space, which is isometrically
isomorphic to the Banach space structure of $\mathcal{N}$. By definition, $\mathcal{N}(0)$ is isomorphic to $\mathcal{N}$
also as a von Neumann algebra. The product topology on $\mathcal{N} \times \mathbb{R}$ (with $\mathcal{N}$ endowed with
the weak-* topology or Arens-Mackey topology, but not with norm topology) allows one to
use the bijections $\mathcal{N}(t) \ni x\phi^{it} \mapsto (x,t) \in \mathcal{N} \times \mathbb{R}$ to form Fell's Banach *-algebra bundle
$F(\mathcal{N}) := \coprod_{t \in \mathbb{R}} \mathcal{N}(t)$ over $\mathcal{N} \times \mathbb{R}$ [49, 50]. The core von Neumann algebra $\widetilde{\mathcal{N}}$ is constructed
from $\mathcal{N}$ as a result of action of the Banach *-algebra of suitable sections of $F(\mathcal{N})$ on a
suitably defined auxiliary Hilbert space (see [48] for details). The construction of $\widetilde{\mathcal{N}}$ does
not depend on any weight in $W_0(\mathcal{N})$. However, for every choice of a particular $\omega \in W_0(\mathcal{N})$, there
exists a unitary map providing the isomorphism $\widetilde{\mathcal{N}} \cong \mathcal{N} \rtimes_{\sigma^\omega} \mathbb{R}$, where $\sigma^\omega_t(\cdot) = \Delta_\omega^{it}(\cdot)\Delta_\omega^{-it}$
denotes the Tomita-Takesaki modular automorphism of $\mathcal{N}$ associated with $\omega$ [154, 149],
$\mathcal{N} \rtimes_{\sigma^\omega} \mathbb{R}$ is a crossed product von Neumann algebra acting on $\mathcal{H} \otimes L_2(\mathbb{R}, dt)$ [150, 158],
and $\mathcal{H}$ is a standard representation Hilbert space of $\mathcal{N}$ [63].

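The one-parameter automorphism group of $F(\mathcal{N})$, defined by

    $\tilde{\sigma}_s(x\phi^{it}) := e^{-ist}\, x\phi^{it}$  $\forall x\phi^{it} \in \mathcal{N}(t)$,    (2)

extends uniquely to an automorphism group $\tilde{\sigma}_s : \widetilde{\mathcal{N}} \to \widetilde{\mathcal{N}}$. It allows one to define a grade
$\mathrm{grad}(T)$ of a closed densely defined operator $T$ affiliated with $\widetilde{\mathcal{N}}$ as such $\gamma \in \mathbb{C}$ that

    $\tilde{\sigma}_s(T) = e^{-\gamma s}\, T$  $\forall s \in \mathbb{R}$.    (3)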

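If $\mathrm{grad}(T) = 0$, then $T$ is bounded, but if $\mathrm{re}\,\mathrm{grad}(T) \neq 0$, then $T$ is unbounded. The action
of $\tilde{\sigma}_s$ on $\widetilde{\mathcal{N}}$ is integrable over $s \in \mathbb{R}$, and $I_{\tilde{\sigma}}(x) := \int_{\mathbb{R}} ds\, \tilde{\sigma}_s(x)$, $x \in \widetilde{\mathcal{N}}^+$, is an operator
valued weight from $\widetilde{\mathcal{N}}$ to $\mathcal{N}$ (for details on operator valued weights, see [65, 66, 47]). This
allows one to equip $\widetilde{\mathcal{N}}$ with a faithful normal semi-finite trace $\tilde{\tau}$, defined by

    $\tilde{\tau}_\phi(x) := \lim_{\varepsilon \to +0} \phi \circ I_{\tilde{\sigma}}\big(\phi^{-1/2}(1 + \varepsilon\phi^{-1})^{-1/2}\, x\, \phi^{-1/2}(1 + \varepsilon\phi^{-1})^{-1/2}\big)$.    (4)

This definition is independent of the choice of weight ($\tilde{\tau}_\phi = \tilde{\tau}_\psi$ $\forall \phi,\psi \in W_0(\mathcal{N})$), which
allows one to write $\tilde{\tau}$ instead of $\tilde{\tau}_\phi$. Moreover, $\tilde{\tau}$ has the scaling property

    $\tilde{\tau} \circ \tilde{\sigma}_s = e^{-s}\, \tilde{\tau}$  $\forall s \in \mathbb{R}$.    (5)

This allows one to consider $\tilde{\tau}$ as a canonical trace of $\widetilde{\mathcal{N}}$. Falcone and Takesaki call the system
$(\widetilde{\mathcal{N}}, \mathbb{R}, \tilde{\sigma}, \tilde{\tau})$ the non-commutative flow of weights.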



2.2 Non-commutative $L_p(\mathcal{N})$ spaces

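Let $\mathfrak{M}^p(\mathcal{N})$ denote the space of all $\tilde{\tau}$-measurable operators of grade $1/p$ affiliated to $\widetilde{\mathcal{N}}$
(for details on measurable operators, see [141, 108, 152]), and let the set $\mathfrak{m}^+ \subseteq \widetilde{\mathcal{N}}^+$ be
defined as

    $\mathfrak{m}^+ := \{x^*y \in \widetilde{\mathcal{N}} \mid \|I_{\tilde{\sigma}}(x^*x)\| < \infty,\ \|I_{\tilde{\sigma}}(y^*y)\| < \infty\}$.    (6)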
Given weight $\phi \in W(\mathcal{N})$ and a canonical trace $\tilde{\tau}$, the construction

    $h_\phi^{it} := (D(\phi \circ I_{\tilde{\sigma}}) : D\tilde{\tau})_t$  $\forall t \in \mathbb{R}$    (7)

defines the operator valued map $\phi \mapsto h_\phi$ on $W(\mathcal{N})$, called the Haagerup correspondence
[64, 166]. From the Pedersen-Takesaki non-commutative generalisation [115] of
the Radon-Nikodym theorem it follows that $h_\phi$ is a unique positive self-adjoint operator
affiliated with $\widetilde{\mathcal{N}}$ which satisfies

    $\phi \circ I_{\tilde{\sigma}}(\cdot) = \tilde{\tau}(h_\phi\, \cdot)$.    (8)

Moreover, for any closed and densely defined positive operator $T$ affiliated with $\widetilde{\mathcal{N}}$ with
$p := \mathrm{re}\,\mathrm{grad}(T) > 0$ there exists a unique weight $\phi$ such that $h_\phi$ is of grade 1 and $h_\phi = |T|^{1/p}$.
The weight $\phi$ is finite ($\phi(\mathbb{I}) < \infty$) iff $h_\phi$ is $\tilde{\tau}$-measurable [108, 64]. Hence, $h_\phi$ can be
considered as (a reference-independent) 'operator density' of $\phi$. For $\phi \in \mathcal{N}_*$, given a polar
decomposition $\phi = u|\phi|$ of $\phi$ as a linear form, an assignment $h_\phi := u h_{|\phi|}$ defines
a unique extension of the Haagerup correspondence map to a natural bijection (linear
isomorphism)

    $\mathcal{N}_* \ni \omega \mapsto h_\omega \in \mathfrak{M}^1(\mathcal{N})$.    (9)

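Falcone and Takesaki show that the integral of $T$, given by

    $\int T := \tilde{\tau}(a^{1/2}\, T\, a^{1/2})$,  $T \in \mathfrak{M}^1(\mathcal{N})$, $a \in \mathfrak{m}^+$, $I_{\tilde{\sigma}}(a) = 1$,    (10)

is well defined. While $\tilde{\tau}$ takes only the $+\infty$ value on non-zero elements of $\mathfrak{M}^1(\mathcal{N})$, the
integral $\int$ takes finite values. This allows one to extend (9) to an isometric isomorphism of
Banach spaces, with the norm on $\mathfrak{M}^1(\mathcal{N})$ defined by $\|T\|_1 := \int |T|$, and with $\|h_\phi\|_1 = \phi(\mathbb{I})$.
The duality pairing between Banach spaces $\mathcal{N}$ and $\mathfrak{M}^1(\mathcal{N})$ that identifies $\mathcal{N}_*$ with
$\mathfrak{M}^1(\mathcal{N})$ is given by the bilinear form

    $\mathcal{N} \times \mathfrak{M}^1(\mathcal{N}) \ni (x,T) \mapsto [\![x,T]\!] := \int xT \in \mathbb{C}$.    (11)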

The non-commutative $L_p(\mathcal{N})$ spaces for $p \in [1,\infty[$ are defined as the spaces $\mathfrak{M}^p(\mathcal{N})$
equipped with, and Cauchy complete in, the norm

    $\|T\|_p := \left(\int |T|^p\right)^{1/p}$,  $T \in \mathfrak{M}^p(\mathcal{N})$.    (12)

By (9) and (11), $L_1(\mathcal{N}) \cong \mathcal{N}_*$, and it is natural to define $L_\infty(\mathcal{N}) := \mathcal{N} \cong \mathcal{N}(0)$, using the
definition (3) of grade with $\tilde{\sigma}_s(T) = T$ for $\mathrm{grad}(T) = 0$. The space $L_2(\mathcal{N})$ is isometrically
isomorphic to the Hilbert space $\mathcal{H}$ of standard representation of $\mathcal{N}$ [63], and the inner
product on $L_2(\mathcal{N})$ defined by

    $L_2(\mathcal{N}) \times L_2(\mathcal{N}) \ni (T_1,T_2) \mapsto \langle T_1,T_2 \rangle_{L_2(\mathcal{N})} := \int T_2^* T_1 \in \mathbb{C}$    (13)

allows one to identify them. For any choice of $\omega \in W_0(\mathcal{N})$ (or $\omega \in \mathcal{N}_{*0}^+$), $L_2(\mathcal{N})$ is unitarily
isomorphic to the GNS Hilbert space $\mathcal{H}_\omega$. By definition, all $L_p(\mathcal{N})$ for $p \in [1,\infty]$ are
Banach spaces. For any choice of a reference weight $\psi \in W_0(\mathcal{N})$, they are isometrically
isomorphic to the non-commutative $L_p(\mathcal{N},\psi)$ spaces of Kosaki and Terp [89, 153], Araki
and Masuda [10, 101], Connes and Hilsum [30, 72], and Haagerup [64, 152]. If the von
Neumann algebra $\mathcal{N}$ is semi-finite and some faithful normal semi-finite trace $\tau$ on $\mathcal{N}$ is
chosen, then the Falcone-Takesaki $L_p(\mathcal{N})$ spaces are isometrically isomorphic and
order-isomorphic to non-commutative $L_p(\mathcal{N},\tau)$ spaces developed in [43, 141, 41, 110, 96, 147,
108, 167].
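
To make the semi-finite case concrete, here is a small illustrative sketch under the assumption (made here, not in the text) that $\mathcal{N} = M_n(\mathbb{C})$ and $\tau = \mathrm{Tr}$. In that case the trace-based $L_p(\mathcal{N},\tau)$ norm is the familiar Schatten norm, with $s_i(T)$ denoting the singular values of $T$:

```latex
\|T\|_p = \left( \mathrm{Tr}\, |T|^p \right)^{1/p}
        = \Big( \sum_{i=1}^{n} s_i(T)^p \Big)^{1/p},
\qquad |T| := (T^*T)^{1/2},
\\
% p = 1 and T = \rho \geq 0: the norm is the total mass of the state \omega = \mathrm{Tr}(\rho\,\cdot)
\|\rho\|_1 = \mathrm{Tr}\,\rho = \omega(\mathbb{I}),
\\
% p = 2: the Hilbert-Schmidt inner product, of the same form as (13)
\langle T_1, T_2 \rangle = \mathrm{Tr}\,(T_2^* T_1).
```

The non-normalised setting of this paper shows up in the second line: $\|\rho\|_1$ is finite but need not equal 1.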

The duality (11) extends to non-commutative $L_p(\mathcal{N})$ space duality, given by the
bilinear map

    $L_p(\mathcal{N}) \times L_q(\mathcal{N}) \ni (S,T) \mapsto [\![S,T]\!] := \int ST \in \mathbb{C}$,    (14)

with $1/p + 1/q = 1$, where $p \in \{z \in \mathbb{C} \mid \mathrm{re}(z) > 0\}$, and $L_{1/it}(\mathcal{N}) = \mathcal{N}(t)$ for all $t \in \mathbb{R}$.
This way the Falcone-Takesaki theory incorporates also Izumi's [76, 77, 78] complex
extension of the weight-dependent Kosaki-Terp theory.

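The spaces $\mathfrak{M}^p(\mathcal{N})$, $p \in \mathbb{C} \setminus \{0\}$, embed naturally into the *-algebra $\mathfrak{M}(\widetilde{\mathcal{N}})$ of all
$\tilde{\tau}$-measurable operators affiliated to $\widetilde{\mathcal{N}}$. Due to the properties of grade function, the
elements of $\mathfrak{M}(\widetilde{\mathcal{N}})$ possess remarkable algebraic properties. The grade function satisfies:

    $\mathrm{grad}(T^*) = (\mathrm{grad}(T))^*$,  $\mathrm{grad}(\overline{ST}) = \mathrm{grad}(S) + \mathrm{grad}(T)$,
    $\mathrm{grad}(|T|) = \mathrm{re}(\mathrm{grad}(T)) = \frac{1}{2}(\mathrm{grad}(T) + \mathrm{grad}(T)^*)$,    (15)
    $\mathrm{re}(p) < 0 \Rightarrow L_p(\mathcal{N}) = \{0\}$,  $\mathrm{re}(\mathrm{grad}(T)) > 0 \Rightarrow |T|^{1/\mathrm{re}(\mathrm{grad}(T))} \in \mathcal{N}_*^+$,

where $\overline{ST}$ is the closure of $ST$, and the last property is understood via the bijection
between $\omega \in \mathcal{N}_*^+$ and the corresponding $h_\omega \in L_1(\mathcal{N})$. If $\phi, \psi \in \mathcal{N}_*^+$ and $x,y \in \mathcal{N}$,
then the elements $x\phi^\gamma$ and $y\psi^\lambda$ can be added and multiplied freely inside $\mathfrak{M}(\widetilde{\mathcal{N}})$ for all
$\gamma, \lambda \in \mathbb{C}$ such that $\mathrm{re}(\gamma) > 0$ and $\mathrm{re}(\lambda) > 0$. Moreover, for any $\{T_i\}_{i=1}^n \subseteq \mathfrak{M}(\widetilde{\mathcal{N}})$ such
that $\sum_i \mathrm{grad}(T_i) = 1$,

    $\prod_i T_i \in L_1(\mathcal{N})$,  $\int T_1 \cdots T_n = \int T_n T_1 \cdots T_{n-1}$,  $\left|\int T_1 \cdots T_n\right| \leq \|T_1\| \cdots \|T_n\|$    (16)

holds. In particular, one can use the formal algebraic relations of $\mathcal{N}(t)$ to rewrite the
Tomita-Takesaki modular automorphism as $\sigma^\omega_t(x) = \Delta_\omega^{it}\, x\, \Delta_\omega^{-it} = \omega^{it} x\, \omega^{-it}$ and to rewrite
Connes' cocycle as $(D\omega : D\phi)_t = \Delta_{\omega,\phi}^{it} \Delta_\phi^{-it} = \omega^{it}\phi^{-it}$, where $\Delta_{\omega,\phi}$ denotes the Connes-Digernes
relative modular operator [6, 29, 40]. These remarkable algebraic properties were first
observed by Woronowicz [165] and were later analysed in detail by Yamagami [166].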
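
The grade calculus can be exercised on a simple case. The following is an illustrative sketch rather than a statement taken from [48]: take $\phi, \psi \in \mathcal{N}_*^+$, identified with positive elements of grade 1 in $L_1(\mathcal{N})$, and $p, q \in\, ]1,\infty[$ with $1/p + 1/q = 1$. It assumes that $\mathrm{grad}(\phi^\gamma) = \gamma$ for $\gamma > 0$, that $\int \phi = \phi(\mathbb{I})$, and that the norms in (16) are the $L_p$ norms of (12).

```latex
\mathrm{grad}\big(\phi^{1/p}\psi^{1/q}\big) = \tfrac{1}{p} + \tfrac{1}{q} = 1
\;\Longrightarrow\; \phi^{1/p}\psi^{1/q} \in L_1(\mathcal{N}),
\\
% |\phi^{1/p}|^p = \phi, so the norm (12) reduces to the total mass of \phi
\|\phi^{1/p}\|_p = \Big( \int \phi \Big)^{1/p} = \phi(\mathbb{I})^{1/p},
\\
\Big| \int \phi^{1/p}\psi^{1/q} \Big|
  \leq \|\phi^{1/p}\|_p \, \|\psi^{1/q}\|_q
  = \phi(\mathbb{I})^{1/p}\, \psi(\mathbb{I})^{1/q}.
```

In the finite-dimensional commutative case this is the classical Hölder inequality $\sum_i \phi_i^{1/p}\psi_i^{1/q} \leq (\sum_i \phi_i)^{1/p} (\sum_i \psi_i)^{1/q}$, with no reference measure or weight entering the statement.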



2.3 Comparison with traditional approaches 

With these results at hand, let us provide a brief discussion of the definition of information
model which was given in the Introduction.

From the mathematical point of view, the setup of $W^*$-algebras and their preduals is
uniquely selected by three requirements:

1) the complex non-commutative generalisation of the abstract (Daniell-Stone) inte- 
gration theory on abstract (Riesz) vector lattices should be provided in terms of 
the abstract *-algebras, 
2) the duality pairing between the *-algebra and the space of integrals on it should be
a Banach space duality between Banach spaces, with a Banach *-algebra being a 
Banach dual of the Banach space of integrals, 

3) every *-homomorphism of *-algebra should be continuous in terms of Banach space 
norm ||-|| (this is assumed through the condition || 

Thus, as opposed to the usual algebraic approach to foundations of quantum theory, we
disregard here the foundational character of $C^*$-algebras (c.f. [139]) or Jordan algebras
(c.f. [46]) and associated interpretation based on spectral properties and commutative
measure theory in favour of $W^*$-algebras and associated interpretation based on dualistic
properties and non-commutative integration theory. On the pragmatic level, this choice
can be motivated by the requirement of providing the unified treatment of commutative
and non-commutative cases (see below), and by an observation that only $W^*$-algebras
provide a Banach space duality between states and algebras that allows one to introduce
the duality between Markov maps and coarse-grainings in the infinite-dimensional
non-commutative case (see next section).

From the physical point of view, the use of non-commutative *-algebras instead of
Hilbert spaces in the definition of information model is necessary in order to deal with
the issue of unitarily inequivalent Hilbert space representations of algebras of operators,
which is one of the key features of the quantum theory in infinite-dimensional (continuous)
regime. This includes the continuous equilibrium statistical mechanics [46, 18] and
relativistic quantum field theory [61, 163, 9] as two most prominent examples. Further
restriction to $W^*$-algebras is caused by the fact that $C^*$-algebras are suitable only for
description of fermionic models, while the description of bosonic models necessarily requires
the use of $W^*$-algebras [18, 129].

From the statistical point of view, the use of positive parts of preduals of $W^*$-algebras
instead of measure spaces $(X, \mho(X), \mu)$ has also an important reason. The traditional
approach [52, 88] assumes a particular choice of a background measurable space $(X, \mho(X))$,
where $\mho(X)$ is some particular countably additive algebra of subsets of some 'sample
space' $X$ (e.g., algebra of Borel subsets of a topological space), and defines a statistical
or probabilistic model as a subset of the space

    $\mathcal{M}(X,\mho(X),\mu) \subseteq L_1(X,\mho(X),\mu)^+$

of all measures that are absolutely continuous with respect to a fixed measure $\mu$. However,
there exist many different choices of $(X, \mho(X), \mu)$ such that the associated $L_1(X, \mho(X), \mu)$
spaces are isomorphic to the same abstract $L_1(\mho)$ space. The abstract $L_1(\mho)$ space
is functorially associated to a countably additive Dedekind complete boolean (caDcb)
algebra $\mho$, and the latter can be defined in terms of $(X, \mho(X), \mu)$ by [87]

    $\mho := \mho(X)/\{A \in \mho(X) \mid \mu(A) = 0\}$.

Hence, the notion of statistical model is in fact independent of the choice of a particular
sample space $X$ and a particular measure space $(X, \mho(X), \mu)$. It depends only on the
choice of a particular algebra $\mho$. Furthermore, the well-defined statistical models should
allow the use of the Steinhaus-Nikodym isomorphism of Banach spaces

    $L_1(X,\mho(X),\mu)^* \cong L_\infty(X,\mho(X),\mu)$.

This requirement is crucial for considering the elements of $\mathcal{M}(X, \mho(X), \mu)$ as the
Radon-Nikodym derivatives with respect to measure $\mu$, because the latter are elements of
$L_\infty(X, \mho(X), \mu)$. However, this isomorphism does not hold for arbitrary measure spaces
[137], but only for those $(X, \mho(X), \mu)$ which are localisable, which is equivalent to
localisability of a pair $(\mho, \nu)$, where $\nu$ is an arbitrary semi-finite strictly positive measure
on caDcb algebra $\mho$ [140]. Thus, the definition of the statistical model which lacks any
mathematical ambiguity should use the localisable pairs $(\mho, \nu)$ instead of measure spaces
$(X, \mho(X), \mu)$. In consequence, we can restrict the considerations to the maharanisable
caDcb algebras, defined as such caDcb algebras that allow semi-finite strictly positive
measures. We will call them camDcb-algebras.

Every $W^*$-algebra is equipped with at least one normal faithful semi-finite weight
$\psi$ [62]. Moreover, every commutative $W^*$-algebra $\mathcal{N}$ is isomorphic to $L_\infty(\mho)$ space for
some camDcb-algebra $\mho$. This follows from [136] and the fact that every localisable
measure space $(X, \mho(X), \mu)$ generates a unique corresponding camDcb-algebra [54, 128]
(the strictly positive semi-finite measure $\mu$ on $\mho$ can be derived from $\psi$, see e.g. [148]).
But, in face of the functorial association of $L_p(\mho)$ spaces with $\mho$ [54], this selects (up
to isomorphism) a unique corresponding $\mho$. In such case, the space $L_1(\mathcal{N})$ of integrals
on $\mathcal{N}$ reduces to the space $L_1(\mho)$. As a result, for commutative $W^*$-algebras $\mathcal{N}$, the
quantum information models $\mathcal{M}(\mathcal{N}) \subseteq \mathcal{N}_*^+$ turn into subspaces $\mathcal{M}(\mho) \subseteq L_1(\mho)^+$, which
are precisely the well-defined statistical models.

The Falcone-Takesaki theory provides a striking generalisation of the representation
independent construction of spaces of integrals to the general non-commutative case,
including the generalisation of functorial association of measure-independent $L_p(\mho)$ spaces
with $\mho$ to functorial association of weight-independent $L_p(\mathcal{N})$ spaces with $\mathcal{N}$, which is
based on the non-commutative flow of weights. In face of such deep structural relationships
between commutative and non-commutative integration theory it is quite hard to
find convincing arguments in favour of consideration of statistical theory and quantum
theory as two separate theories (we do not count here the philosophical prejudices, but
only the mathematical structure and the range of its applicability in concrete problems).
From this perspective, relying on spectral theory, which represents quantum theoretic
models in terms of commutative statistical models via the $L_2(X, \mho(X), \mu)$ space, can
no longer be justified. Quantum theoretic models can be quantitatively constructed and
analysed as information models in their own right. Note that these conceptual changes
are consequences of a single mathematical change: replacement of measure theory by
integration theory.

Finally, let us note that we disregard the normalisation of states in favour of finiteness,
because we consider the notion of information, quantified by finite positive integral,
as more fundamental than the notion of probability, quantified by normalised measure.
The quantitative evaluations $\omega \in \mathcal{N}_*^+$ can be used to store and transform information
irrespectively of the normalisation. This perspective follows the ideas of Ingarden and
Urbanik [74, 75], but is caused also by impossibility of maintaining the relative frequency
interpretation for normalised integrals on non-commutative $W^*$-algebras beyond the type
II$_1$ factors, as was observed already by von Neumann [162, 132]. Yet, without relative
frequencies associated on the level of interpretation, the normalisation condition is just
irrelevant. Thus, if one decides to consider the non-commutative integration as equally
fundamental as the commutative one, there is just no mathematical or conceptual reason
why one should restrict the information theoretic considerations to normalised integrals
instead of finite ones.

3 Information deviations 

Two crucial properties that can be used to characterise the structures of information ge- 
ometry (information deviations, metrics, connections) are the monotonicity under coarse 
graining and the generalised orthogonal decomposition under projection onto convex sub- 
spaces (that is, generalised cosine theorem). These properties encode, respectively, the 
requirements that "the coarse-graining of information should always be indicated by the 
decrease of relative measure of information", and "the restriction of relative measure of 
information to subinformation (submodel) should be expressible in terms of its additive 
decomposition". 

In this section we will consider the families of deviations that are defined separately by 
each of these conditions. The information deviations that satisfy both of these conditions 
will be considered in Section 4. 

3.1 Markov monotone deviations 

In the commutative normalised case the coarse graining $T^c$ is defined in terms of the
Markov map $T$ by $T^c : \mathcal{V} \ni \omega \mapsto \omega \circ T \in \mathcal{V}$ such that, given
$\omega : L_\infty(\mho, \mu; \mathbb{C}) \ni f \mapsto \int \mu f \in \mathbb{C}$, $T : L_\infty(\mho, \mu; \mathbb{C}) \to L_\infty(\mho', \mu'; \mathbb{C})$ is a normal
(monotonically continuous) positive linear map satisfying $T(\mathbb{I}) = \mathbb{I}$. In the non-commutative
case the Markov maps are defined by normal unital completely positive linear maps
$T : \mathcal{N}' \to \mathcal{N}$ between $W^*$-algebras $\mathcal{N}$, $\mathcal{N}'$ [146]. In the finite-dimensional normalised
non-commutative case it is known (see e.g. [106, 122]) that there is a duality between Markov
maps on $W^*$-algebras and coarse-grainings defined as normal trace-preserving completely
positive maps on the corresponding spaces of density operators. However, this duality
depends on a choice of underlying Hilbert space (or some reference state or weight), and
the same is true for various infinite-dimensional generalisations of it.

Using the Falcone-Takesaki theory [48] and the results of Petz [117] we can show
that this duality holds in the general non-commutative case without any additional
assumptions. Consider a pair $(\mathcal{N}, \phi)$, where $\mathcal{N}$ is a von Neumann algebra and $\phi \in \mathcal{N}_*^+$.
Define contraction as a morphism of pairs $(\mathcal{N}, \phi) \to (\mathcal{N}', \phi')$ given by the linear map
$T : \mathcal{N} \to \mathcal{N}'$ such that $0 \leq x \leq \mathbb{I} \Rightarrow 0 \leq T(x) \leq \mathbb{I}$, $y \in \mathcal{N}^+ \Rightarrow \phi'(T(y)) \leq \phi(y)$, and $T$
is weakly continuous. The pairs $(\mathcal{N}, \phi)$ and contractions between them define a category
Cont. For any $T \in \mathrm{Mor}(\mathrm{Cont})$ we define a dual map of $T$ as such $T^c : \mathcal{N}'_* \to \mathcal{N}_*$ that

$$[\![T(x), \phi']\!]_{\mathcal{N}'} = [\![x, T^c(\phi')]\!]_{\mathcal{N}} \quad \forall x \in \mathcal{N} \ \forall \phi' \in \mathcal{N}'_*, \qquad (17)$$

where $[\![\cdot,\cdot]\!]_{\mathcal{N}}$ is a Falcone-Takesaki duality (11). The map $T^c$ is a normal positive
contraction, hence $(\cdot)^c : \mathrm{Cont} \to \mathrm{Cont}$ is a contravariant functor such that $(\cdot)^{cc} = \mathrm{id}_{\mathrm{Cont}}$.
Moreover, following the proof in [117], it is easy to prove that $T$ is unital and completely
positive iff $T^c$ is trace-preserving and completely positive. Hence, we can replace the
discussion in terms of the Markov maps (communication channels) by the discussion in
terms of the dual Markov maps (coarse-grainings).
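
The duality between a Markov map and its coarse-graining can be illustrated in the simplest finite commutative setting. The sketch below is only an illustration under stated assumptions: the algebra is the space of functions on a finite set, the Markov map is given by a hypothetical row-stochastic matrix `K`, and the pairing is the ordinary integral of a function against a measure. The names `markov_map`, `coarse_graining` and `pairing` are introduced here and are not taken from the text.

```python
# Finite commutative toy model (an assumption of this sketch, not the general
# Falcone-Takesaki setting): functions and measures are plain lists of floats.

def markov_map(K, f):
    """T: positive unital linear map on functions, (T f)_i = sum_j K[i][j] f_j."""
    return [sum(K[i][j] * f[j] for j in range(len(f))) for i in range(len(K))]

def coarse_graining(K, mu):
    """T^c: dual map on measures, (T^c mu)_j = sum_i mu_i K[i][j]."""
    return [sum(mu[i] * K[i][j] for i in range(len(mu))) for j in range(len(K[0]))]

def pairing(mu, f):
    """Integral of the function f against the measure mu."""
    return sum(m * v for m, v in zip(mu, f))

# Hypothetical row-stochastic matrix: every row sums to 1, so T(1) = 1.
K = [[0.5, 0.5, 0.0],
     [0.0, 0.25, 0.75]]
f = [1.0, 2.0, 3.0]   # function on the 3-point set
mu = [0.2, 0.8]       # finite positive measure on the 2-point set

lhs = pairing(mu, markov_map(K, f))        # [T(f), mu]
rhs = pairing(coarse_graining(K, mu), f)   # [f, T^c(mu)]
print(lhs, rhs)
```

In this toy model unitality of `markov_map` corresponds to preservation of total mass by `coarse_graining`, which is the commutative shadow of the "unital iff trace-preserving" statement above.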

The condition of monotonicity under coarse graining imposed on the information
deviation $D$ reads $D(\omega, \phi) \geq D(\omega \circ T, \phi \circ T)$, or equivalently,

$$D(\omega, \phi) \geq D(T^c(\omega), T^c(\phi)) \qquad (18)$$

for all $\omega, \phi \in \mathcal{M}$ and for all coarse-grainings $T^c$. As remarked in the introduction, in the
commutative finite-dimensional case (and under some mild auxiliary conditions) (18)
characterises [36] the class of Csiszar-Morimoto [34, 35, 104] deviations. Their direct
generalisation to the general non-commutative case is provided by the Petz deviations [119, 120]

$$D_f(\omega, \phi) := \langle \xi_\omega, f(\Delta_{\phi,\omega}) \xi_\omega \rangle_{\mathcal{H}}, \qquad (19)$$

where $f : [0, \infty[ \to \mathbb{R}$ is a continuous operator monotone function with $f(0) \geq 0$ ($f$ is
called operator monotone iff $0 \leq x \leq y \Rightarrow 0 \leq f(x) \leq f(y)$ for any bounded operators
$x, y \in \mathfrak{B}(\mathcal{H})$). The Hilbert space $\mathcal{H}$ is a standard representation of the $W^*$-algebra $\mathcal{N}$,
and $\xi_\omega$ is the unique representative of $\omega$ in the natural positive cone of the standard
representation Hilbert space $\mathcal{H}$. All Petz deviations (19) satisfy (18), but it is not known
[126] whether these are the only deviations on $\mathcal{N}_*^+$ which have this property.
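
The monotonicity condition (18) can be checked numerically in the finite commutative case. The following sketch assumes the Csiszar-Morimoto form D_f(mu, nu) = sum_i nu_i f(mu_i / nu_i) with a convex f, strictly positive measures, and a coarse-graining given by a hypothetical row-stochastic matrix `K`; the particular f, the argument convention and all names are choices made here for illustration only.

```python
import math

def csiszar_deviation(mu, nu, f):
    """Commutative deviation sum_i nu_i * f(mu_i / nu_i) for strictly positive nu."""
    return sum(n * f(m / n) for m, n in zip(mu, nu))

def f_kl(t):
    """Convex function with f(1) = 0 (an illustrative choice)."""
    return t * math.log(t) - t + 1.0

def coarse_graining(K, mu):
    """Dual Markov map on measures: (T^c mu)_j = sum_i mu_i K[i][j]."""
    return [sum(mu[i] * K[i][j] for i in range(len(mu))) for j in range(len(K[0]))]

# Hypothetical row-stochastic matrix sending a 3-point set to a 2-point set.
K = [[0.9, 0.1],
     [0.5, 0.5],
     [0.2, 0.8]]
mu = [0.2, 0.3, 0.5]
nu = [0.4, 0.4, 0.2]

before = csiszar_deviation(mu, nu, f_kl)
after = csiszar_deviation(coarse_graining(K, mu), coarse_graining(K, nu), f_kl)
print(before, after)  # the deviation does not increase under coarse-graining
```

The inequality holds here because the map (m, n) -> n f(m / n) is jointly convex and positively homogeneous, which is the mechanism behind (18) in the commutative case.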

3.2 Generalised Bregman deviations 

While the monotonicity under coarse graining requires introducing the framework of
Markov maps and the Falcone-Takesaki duality, the generalised orthogonal decomposition
requires introducing the framework of non-smooth variational analysis and the
Legendre-Fenchel duality (for an exposition of the latter, see [51, 134, 17, 16]).

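The existence of generalised orthogonal decompositions is involved in the representation
properties of $\mathcal{M}$. Let $L$ and $L^{\mathbf{d}}$ be two linear (vector) spaces over $\mathbb{R}$ (or $\mathbb{C}$), equipped
with a bilinear duality pairing $[\![\cdot,\cdot]\!]_L : L \times L^{\mathbf{d}} \to \mathbb{R}$ (or $\mathbb{C}$). Let $\Psi : L \to [-\infty, +\infty]$ and
let $\mathrm{dom}\,\Psi := \{x \in L \mid \Psi(x) < +\infty\}$. The Fenchel subdifferential [51, 102, 21] of $\Psi$ at
$x \in \mathrm{dom}\,\Psi$ is defined as a set

$$\partial\Psi(x) := \{y \in L^{\mathbf{d}} \mid \Psi(z) - \Psi(x) \geq [\![z - x, y]\!]_L \ \forall z \in L\}. \qquad (20)$$

The Legendre-Fenchel dual of $\Psi$ is defined as $\Psi^{\mathbf{L}} : L^{\mathbf{d}} \to [-\infty, +\infty]$ such that

$$\Psi^{\mathbf{L}}(y) := \sup_{x \in L} \{[\![x, y]\!]_L - \Psi(x)\} \quad \forall y \in L^{\mathbf{d}}. \qquad (21)$$

One defines $\Psi^{\mathbf{LL}} : L \to [-\infty, +\infty]$ by $\Psi^{\mathbf{LL}} := (\Psi^{\mathbf{L}})^{\mathbf{L}}$. The functions $\Psi^{\mathbf{L}}$ and $\Psi^{\mathbf{LL}}$ are
convex for any $\Psi$, and $\Psi^{\mathbf{LL}} \leq \Psi$. If $\mathrm{dom}\,\Psi \neq \emptyset$, then $\Psi^{\mathbf{L}}(x) > -\infty \ \forall x \in L^{\mathbf{d}}$. If $(L, L^{\mathbf{d}})$
are separated locally convex vector spaces, equipped with weak-* and weak topologies,
respectively, then $\Psi^{\mathbf{L}}$ is weak-* lower semi-continuous, $\Psi^{\mathbf{LL}}$ is weak lower semi-continuous,
and ($\Psi^{\mathbf{LL}} = \Psi$ iff $\Psi$ is weak lower semi-continuous and convex) [20]. In what follows, we
will always assume $\mathrm{dom}\,\Psi \neq \emptyset$. If $\Psi : L \to \mathbb{R} \cup \{+\infty\}$ is convex and $y \in L^{\mathbf{d}}$, then the
Young-Fenchel inequality

$$\Psi(x) + \Psi^{\mathbf{L}}(y) - [\![x, y]\!]_L \geq 0 \qquad (22)$$

holds, with equality iff $y \in \partial\Psi(x)$ and $x \in \mathrm{dom}\,\Psi$. If $L$ and $L^{\mathbf{d}}$ are over $\mathbb{C}$, then $[\![x, y]\!]_L$
in (21) and (22) is replaced by $\mathrm{re}\,[\![x, y]\!]_L$.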
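
A one-dimensional numerical sketch may help to fix the meaning of (21) and (22). It assumes L = L^d = R with the pairing [x, y] = x*y and the illustrative convex function Psi(x) = |x|^p / p, whose Legendre-Fenchel dual is known in closed form to be |y|^q / q with 1/p + 1/q = 1. The supremum in (21) is approximated by a maximum over a finite grid, which is an approximation introduced here, not part of the definition.

```python
def psi(x, p=3.0):
    """Illustrative convex function Psi(x) = |x|^p / p on the real line."""
    return abs(x) ** p / p

def legendre_fenchel(y, p=3.0, lower=-10.0, upper=10.0, steps=20000):
    """Grid approximation of sup_x { x*y - Psi(x) } over [lower, upper]."""
    best = float("-inf")
    for k in range(steps + 1):
        x = lower + (upper - lower) * k / steps
        best = max(best, x * y - psi(x, p))
    return best

def young_fenchel_gap(x, y, p=3.0):
    """Left-hand side of (22): Psi(x) + Psi^L(y) - x*y."""
    return psi(x, p) + legendre_fenchel(y, p) - x * y

p = 3.0
q = p / (p - 1.0)                 # conjugate exponent
y = 2.0
numeric = legendre_fenchel(y, p)
closed_form = abs(y) ** q / q     # known dual of |x|^p / p

x_star = 1.5
y_star = x_star ** (p - 1.0)      # derivative of Psi at x_star > 0
print(numeric, closed_form, young_fenchel_gap(0.7, y), young_fenchel_gap(x_star, y_star))
```

The gap is strictly positive for a generic pair and vanishes (up to rounding) when y is the derivative of Psi at x, which is the smooth special case of the condition y in the subdifferential of Psi at x.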

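This allows one to define the Bregman functional for a convex $\Psi : L \to \mathbb{R} \cup \{+\infty\}$ as a
map

$$L \times L^{\mathbf{d}} \ni (x, y) \mapsto \tilde{D}_\Psi(x, y) := \Psi(x) + \Psi^{\mathbf{L}}(y) - [\![x, y]\!]_L \in [0, +\infty], \qquad (23)$$

if $L$ is over $\mathbb{R}$, and with $[\![x, y]\!]_L$ replaced by $\mathrm{re}\,[\![x, y]\!]_L$ if $L$ is over $\mathbb{C}$. By definition, $\tilde{D}_\Psi$
is convex in each variable separately, $\tilde{D}_\Psi(x, y) \geq 0 \ \forall (x, y) \in L \times L^{\mathbf{d}}$, and $\tilde{D}_\Psi(x, y) = 0
\iff y \in \partial\Psi(x)$. We will call the Bregman functional $\tilde{D}_\Psi$ adequate for $V \subseteq L$ iff
$\partial\Psi(x) \neq \emptyset \ \forall x \in \mathrm{dom}\,\Psi \cap V$. There exist various criteria for non-emptiness of the Fenchel
subdifferential which can be used to ensure adequacy of $\tilde{D}_\Psi$. In particular, if $(L, L^{\mathbf{d}})$ are
Banach spaces, and $\Psi$ is convex and lower semi-continuous, then the Fenchel-Rockafellar
theorem [134, 17] states that $\partial\Psi(x) \neq \emptyset \ \forall x \in \mathrm{core}(\mathrm{dom}\,\Psi)$, where $x \in \mathrm{core}(X)$ for
$\emptyset \neq X \subseteq L$ iff $\forall h \in S_L \ \exists \varepsilon > 0 \ \forall t \in [0, \varepsilon] \ x + th \in X$, with $S_L$ denoting a unit sphere
in $L$.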

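Define dual coordinate system (or dual representation) as a map $(r, s) : \mathcal{M} \times \mathcal{M} \ni
(\omega, \phi) \mapsto (r(\omega), s(\phi)) \in L \times L^{\mathbf{d}}$. We define the generalised Bregman deviation as a function

$$D_\Psi : \mathcal{M} \times \mathcal{M} \ni (\omega, \phi) \mapsto D_\Psi(\omega, \phi) := \tilde{D}_\Psi(r(\omega), s(\phi)) \in [0, +\infty], \qquad (24)$$

where $(r, s)$ is a dual coordinate system such that $s(\omega) \in \partial\Psi(r(\omega)) \ \forall \omega \in \mathcal{M}$, while $\tilde{D}_\Psi$
is a Bregman functional (23) adequate for $\mathrm{cod}(r)$. By definition, $D_\Psi(\omega, \phi)$ is convex in each
variable separately, $D_\Psi(\omega, \phi) \geq 0 \ \forall \omega, \phi \in \mathcal{M}$, and $\omega = \phi \Rightarrow D_\Psi(\omega, \phi) = 0 \ \forall \omega, \phi \in \mathcal{M}$. This
weakening of the usual property of deviation ($\omega = \phi \iff D(\omega, \phi) = 0$) to one-sided
implication is the price paid for defining $D_\Psi$ in terms of a weakly constrained (hence, quite
general) variational problem. In order to impose an implication in the opposite direction, one
would have to require additional conditions that are not natural at this level of generality
(they will be discussed below). Note also that due to adequacy of $\tilde{D}_\Psi$, the condition
$s(\omega) \in \partial\Psi(r(\omega))$ is well defined for all $\omega \in \mathcal{M}$. It can be understood either as a condition
on allowed dual coordinate systems if $\Psi$ is given, or as a condition on $\Psi$ if $(r, s)$ is given.
To summarise, an information deviation $D : \mathcal{M} \times \mathcal{M} \to [0, \infty]$ is a generalised Bregman
deviation, denoted $D_\Psi$, iff there exists a pair of dualised vector spaces $(L, L^{\mathbf{d}}, [\![\cdot,\cdot]\!]_L)$, a
pair of functions $r : \mathcal{M} \to L$ and $s : \mathcal{M} \to L^{\mathbf{d}}$, and a convex function $\Psi : L \to \mathbb{R} \cup \{+\infty\}$
satisfying the conditions

$$\partial\Psi(x) \neq \emptyset \ \forall x \in \mathrm{dom}\,\Psi \cap \mathrm{cod}\,r, \quad s(\omega) \in \partial\Psi(r(\omega)) \ \forall \omega \in \mathcal{M}, \qquad (25)$$

and such that

$$D(\omega, \phi) = \sup_{x \in L} \{[\![x, s(\phi)]\!]_L - \Psi(x)\} - [\![r(\omega), s(\phi)]\!]_L + \Psi(r(\omega)), \qquad (26)$$

with $[\![\cdot,\cdot]\!]_L$ replaced by $\mathrm{re}\,[\![\cdot,\cdot]\!]_L$ if $L$ is over $\mathbb{C}$. Note that one could also define the
functional

$$D_\Psi(\omega, \phi) = \sup_{\omega \in \mathcal{M}} \{[\![r(\omega), s(\phi)]\!]_L - \Psi(r(\omega))\} - [\![r(\omega), s(\phi)]\!]_L + \Psi(r(\omega)), \qquad (27)$$

but as long as $\mathrm{cod}(r) \subsetneq \mathrm{dom}\,\Psi \subseteq L$, (27) is essentially different from (26), due to the non-linear
character of the sup operation.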
3.3 Dualisers of representation

The generalised Bregman deviation (26) exposes the underlying dualistic and variational
properties of the deviation functional. However, another representation of the Bregman
deviation is often used, which exposes its geometric properties at the price of
non-trivial restrictions on the domain of duality and convexity. Usually these restrictions are
imposed in order to adapt to a presupposed topological framework and some presupposed
form of representation of the Bregman deviation, and the goal is to show that such a Bregman
deviation encodes the Legendre case of the Legendre-Fenchel duality with the dual variable
$y \in L^{\mathbf{d}}$ given by some suitably defined notion of derivative (e.g. Frechet, Gateaux,
one-sided Gateaux), see e.g. [24, 12, 22, 16] for standard examples in the commutative case
and [125] for an example in the finite-dimensional non-commutative case. Our approach is
different, because we do not assume any fixed framework for continuity or smoothness,
so we can analyse the general relationship between the explicitly dual (generalised) Bregman
deviation and its standard (hence, restricted) version, which has both arguments represented
on the same space. We will now construct a general framework for conversion
between these two forms of the Bregman deviation, which is independent of any particular
assumptions about continuity or differentiability. The key role in this setting will
be played by the (generally, non-linear) dualiser function. It is an infinite-dimensional
generalisation of the transformation between the dual coordinate systems of the Bregman
functional, which in the finite-dimensional case is usually encoded in terms of a derivative.

For a convex $\Psi : L \to \mathbb{R} \cup \{+\infty\}$ define a function $f_\Psi : L \to L^{\mathbf{d}}$ such that there exists
a set $\emptyset \neq V \subseteq L$ satisfying:

($1_f$) $f_\Psi : V \to f_\Psi(V)$ is an isomorphism,

($2_f$) $\forall y \in V \quad \Psi^{\mathbf{L}}(f_\Psi(y)) + \Psi(y) = [\![y, f_\Psi(y)]\!]_L$,

($3_f$) $\forall x \in \mathrm{dom}\,\Psi \cap V \ \exists! z \in f_\Psi(V) \quad z \in \partial\Psi(x)$.

Such a function $f_\Psi$ will be called a dualiser, while the set $V$ will be denoted $\mathrm{adm} f_\Psi$ and
called an admissible domain of $f_\Psi$. We will consider $f_\Psi$ only on the set $\mathrm{adm} f_\Psi$, so we
define $\mathrm{cod} f_\Psi := f_\Psi(\mathrm{adm} f_\Psi)$. The function $\Psi$ will be called dualisable with respect to
$(L, L^{\mathbf{d}}, [\![\cdot,\cdot]\!]_L)$ iff there exists at least one dualiser $f_\Psi$. Note that dualisability of $\Psi$ is a
relative property: a change of the domain $L$ or a change of the duality structure on $L$
changes the available dualisers. Note also that the condition ($3_f$) can be equivalently
written as

$$\mathrm{cod} f_\Psi \cap \partial\Psi|_{\mathrm{adm} f_\Psi \cap \mathrm{dom}\,\Psi}(x) = \{*\}, \qquad (28)$$

where $\{*\}$ is a singleton. For convenience of notation, till the end of this section we will
assume that $L$ is over $\mathbb{C}$ (if it is over $\mathbb{R}$, then one has just to drop 're' everywhere).

If $\tilde{D}_\Psi$ is a Bregman functional (23) and $\Psi$ is dualisable with a dualiser $f_\Psi$, then the
unbounded standard Bregman functional is defined as a map

$$D_\Psi : L \times \mathrm{adm} f_\Psi \ni (x, y) \mapsto D_\Psi(x, y) := \Psi(x) - \Psi(y) - \mathrm{re}\,[\![x - y, f_\Psi(y)]\!]_L \in \mathbb{R} \cup \{+\infty\}, \qquad (29)$$

while the bounded standard Bregman functional is defined as a map

$$D_\Psi : \mathrm{dom}\,\Psi \times \mathrm{adm} f_\Psi \ni (x, y) \mapsto D_\Psi(x, y) := \Psi(x) - \Psi(y) - \mathrm{re}\,[\![x - y, f_\Psi(y)]\!]_L \in \mathbb{R}. \qquad (30)$$

By definition, $D_\Psi$ satisfies

i) $D_\Psi(x, y) = \tilde{D}_\Psi(x, f_\Psi(y))$,

ii) $D_\Psi(x, y) \geq 0$,

iii) $D_\Psi(x, y) = 0 \iff x = y$,

for all $(x, y) \in L \times \mathrm{adm} f_\Psi$ (or for all $(x, y) \in \mathrm{dom}\,\Psi \times \mathrm{adm} f_\Psi$, if $D_\Psi$ is bounded). The
two-sided implication appears here at the price of loss of convexity of $D_\Psi$ in the second
variable (which is a common problem in standard treatments, see e.g. [13]). This is
because (the inverse of) a dualiser may not preserve the convexity properties.
The (standard) Bregman deviation is defined correspondingly as

$$D_\Psi(\omega, \phi) := D_\Psi(r(\omega), f_\Psi^{-1} \circ s(\phi)) = \Psi(r(\omega)) - \Psi(f_\Psi^{-1} \circ s(\phi)) - \mathrm{re}\,[\![r(\omega) - f_\Psi^{-1} \circ s(\phi), s(\phi)]\!]_L, \qquad (31)$$

where $(r, s) : \mathcal{M} \times \mathcal{M} \to L \times L^{\mathbf{d}}$ is a dual coordinate system such that $s(\omega) \in \partial\Psi(r(\omega)) \ \forall \omega \in
\mathcal{M}$. By definition, $D_\Psi(\omega, \phi) = \tilde{D}_\Psi(r(\omega), s(\phi))$, where $\tilde{D}_\Psi$ is a Bregman functional adequate for
$\mathrm{cod}(r)$. It follows that a single generalised Bregman deviation can have several different
representations in terms of standard Bregman deviations, depending on the choice of
dualiser $f_\Psi : L \supseteq \mathrm{adm} f_\Psi \to \mathrm{cod} f_\Psi \subseteq L^{\mathbf{d}}$. If $D_{\Psi,1}$ and $D_{\Psi,2}$ are two standard Bregman
functionals defined from a single generalised Bregman functional $\tilde{D}_\Psi$ by two dualisers
$f_1$ and $f_2$, then they are equal to each other on $V \subseteq \mathrm{adm} f_1 \cap \mathrm{adm} f_2$ iff there exists a
dualiser $f_3$ of $\Psi$ such that $\mathrm{adm} f_3 = V$. The existence of different dualisers is equivalent
to $\partial\Psi(r(\cdot))$ being a non-singleton, non-empty, set-valued function. From the geometric
perspective, the choice of a particular dualiser (and an associated representation of the
generalised Bregman deviation in terms of a standard Bregman deviation) corresponds to the choice
of a particular section of the presheaf $\mathcal{M} \ni \omega \mapsto \partial\Psi(r(\omega)) \subseteq L^{\mathbf{d}}$.

The definition of the bounded standard Bregman functional implies that $D_\Psi$ satisfies
the generalised cosine equation

$$D_\Psi(r_1, r_2) + D_\Psi(r_2, r_3) - D_\Psi(r_1, r_3) = \mathrm{re}\,[\![r_1 - r_2, f_\Psi(r_3) - f_\Psi(r_2)]\!]_L \quad \forall r_1, r_2, r_3 \in \mathrm{adm} f_\Psi \cap \mathrm{dom}\,\Psi. \qquad (32)$$

Under suitable assumptions that guarantee existence and uniqueness of the solution of
the corresponding variational problem, this equation can be turned into a theorem on
existence and uniqueness of generalised additive decomposition of information deviation
under projection onto subspace (submodel).
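
The generalised cosine equation (32) is a purely algebraic consequence of the form of the standard Bregman functional, and it can be verified numerically in a finite-dimensional real example. The sketch below assumes L = R^n with the Euclidean pairing, the illustrative convex function Psi(x) = sum_i (x_i log x_i - x_i) on strictly positive vectors, and the gradient x -> log x playing the role of the dualiser; these choices and all names are assumptions of this example, not the general construction.

```python
import math

def psi(x):
    """Illustrative convex function on strictly positive vectors."""
    return sum(v * math.log(v) - v for v in x)

def dualiser(x):
    """Gradient of psi, used here as the dualiser f_Psi (smooth special case)."""
    return [math.log(v) for v in x]

def pairing(x, y):
    """Euclidean duality pairing on R^n."""
    return sum(a * b for a, b in zip(x, y))

def standard_bregman(x, y):
    """Psi(x) - Psi(y) - [x - y, f_Psi(y)], cf. the standard Bregman functional."""
    diff = [a - b for a, b in zip(x, y)]
    return psi(x) - psi(y) - pairing(diff, dualiser(y))

r1 = [0.2, 0.5, 1.3]
r2 = [0.7, 0.1, 0.9]
r3 = [1.1, 0.4, 0.3]

# Left- and right-hand sides of the generalised cosine equation.
lhs = standard_bregman(r1, r2) + standard_bregman(r2, r3) - standard_bregman(r1, r3)
rhs = pairing([a - b for a, b in zip(r1, r2)],
              [c - d for c, d in zip(dualiser(r3), dualiser(r2))])
print(lhs, rhs)
```

For this choice of Psi the standard Bregman functional reduces to sum_i (x_i log(x_i / y_i) - x_i + y_i), so non-negativity and vanishing on the diagonal can be read off directly.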

Let $y \in \mathrm{adm} f_\Psi \cap \mathrm{dom}\,\Psi$, let $R \subseteq \mathrm{adm} f_\Psi \cap \mathrm{dom}\,\Psi$ be non-empty and convex, and let
$r \in R$. The element $x \in R$ will be called the Bregman projection of $y$ on $R$, and denoted
$P_R^{D_\Psi}(y)$, iff

$$x = \arg\inf_{r \in R} \{D_\Psi(r, y)\}. \qquad (33)$$

The problem with this definition is that in the general case $P_R^{D_\Psi}(y)$ might not exist or might
be non-unique. The existence and uniqueness of the Bregman projection can be guaranteed
under various assumptions. In particular, the existence can be guaranteed by means of
Bauer's theorem [11] (if $L$ is a locally convex space, $R$ is weakly compact, and $D_\Psi$ is
weakly lower semi-continuous). On the other hand, if $L$ is a reflexive Banach space, $R$
is closed, $D_\Psi$ is lower semi-continuous, strictly convex, and Gateaux differentiable at $y$,
with $\mathrm{int}\,\mathrm{dom}\,D_\Psi \neq \emptyset$, $R \cap \mathrm{dom}\,D_\Psi \neq \emptyset$ and $y \in \mathrm{int}\,\mathrm{dom}\,D_\Psi$, then $P_R^{D_\Psi}(y)$ is at most a singleton
[16]. Unfortunately, we were unable to find sufficient conditions for uniqueness and
existence of the Bregman projections that would be phrased without appealing to a
particular topological framework. Expressing such conditions in general terms of dualisers
would be a valuable result.

Let $(L, L^{\mathbf{d}})$ be a dual pair of separated locally convex spaces, let $R \subseteq \mathrm{adm} f_\Psi \cap \mathrm{dom}\,\Psi$
be a non-empty convex, affine, weakly closed set, and let $x := P_R^{D_\Psi}(y)$ be a unique Bregman
projection of $y \in \mathrm{adm} f_\Psi \cap \mathrm{dom}\,\Psi$. Then the generalised pythagorean theorem

$$D_\Psi(r, x) + D_\Psi(x, y) = D_\Psi(r, y) \quad \forall r \in R \qquad (34)$$

holds iff the orthogonality condition $\mathrm{re}\,[\![r - x, f_\Psi(y) - f_\Psi(x)]\!]_L = 0$ is satisfied. Such $x =
P_R^{D_\Psi}(y)$ is called a (dually) orthogonal Bregman projection. The property (34) generalises
the additive decompositions of norm under linear projections on closed convex subsets
in the Hilbert space to the class of non-linear Bregman projections on closed convex
subsets in the linear space $L$. Note that the 'orthogonality' of projection is understood
in the sense of the linear duality pairing $[\![\cdot,\cdot]\!]_L$, while the non-linearity of the projection $P_R^{D_\Psi}$
corresponds to the non-linear dualiser $f_\Psi$.



4 The γ-family of relative entropies

4.1 Families of γ-deviations

By imposing monotonicity under coarse-graining condition on the generalised Bregman 
deviation (or on a corresponding standard Bregman deviation) one obtains strong re- 
striction on the allowed form of the deviation and corresponding dual coordinate systems 
(representations). For the finite-dimensional commutative case, Amari [4] has shown
that the unique deviation that satisfies these constraints is the Zhu-Rohwer γ-deviation
[171, 172, 173]

$$D_\gamma(\omega, \phi) = \int \frac{\gamma \mu_\omega + (1 - \gamma) \nu_\phi - \mu_\omega^{\gamma} \nu_\phi^{1-\gamma}}{\gamma(1 - \gamma)}, \qquad (35)$$

where $\mu_\omega$ and $\nu_\phi$ are the finite positive measures corresponding to the Daniell-Stone
integrals $\omega$ and $\phi$, while $\mu_\omega^{\gamma} \nu_\phi^{-\gamma} = (\mu_\omega / \nu_\phi)^{\gamma}$ is just the γ-th power of the Radon-Nikodym
derivative $\mu_\omega / \nu_\phi$. The most general deviation satisfying both constraints in the non-commutative
case that has been known so far is the Jencova-Ojima γ-deviation [81, 84, 112]

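$$D_\gamma(\omega, \phi) = \frac{\gamma \omega(\mathbb{I}) + (1 - \gamma) \phi(\mathbb{I}) - \mathrm{re}\,[\![u \Delta_{\omega,\psi}^{\gamma}, v \Delta_{\phi,\psi}^{1-\gamma}]\!]_\psi}{\gamma(1 - \gamma)}, \qquad (36)$$

where $\psi \in \mathcal{W}_0(\mathcal{N})$ is an arbitrary reference functional, $[\![\cdot,\cdot]\!]_\psi$ is the Banach space duality
pairing between the Araki-Masuda non-commutative $L_{1/\gamma}(\mathcal{N}, \psi)$ and $L_{1/(1-\gamma)}(\mathcal{N}, \psi)$
spaces, $\Delta_{\omega,\psi}$ is a relative modular operator, while $u$ (resp., $v$) is an operator arising
from the polar decomposition of $\omega$ (resp., $\phi$), $\omega(xu) = [\![u \Delta_{\omega,\psi}^{\gamma}, x^{\mathbf{d}}]\!]_\psi \ \forall x \in \mathcal{N}$, where $x^{\mathbf{d}}$
denotes a dual of $x$ with respect to the Araki-Masuda duality $[\![\cdot,\cdot]\!]_\psi$.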
The deviations given by (35) and (36) are well defined for $\gamma \in \mathbb{R} \setminus \{0, 1\}$, however
the characterisation results include also the boundary case $\gamma \in \{0, 1\}$, for which the
corresponding deviations are derived from $D_\gamma$ by passing to the limit with γ under the integral
sign. If the normalisation condition is imposed, then the uniqueness results are stronger,
selecting precisely the boundary functionals: the Kullback-Leibler [95, 94] deviation

$$D_0(\omega, \phi) = D_1(\phi, \omega) = \int \nu_\phi \log \frac{\nu_\phi}{\mu_\omega} \qquad (37)$$

in the finite-dimensional commutative case [37], and the Umegaki-Araki [157, 7, 8] deviation

$$D_0(\omega, \phi) = D_1(\phi, \omega) = \mathrm{tr}(\rho_\phi (\log \rho_\phi - \log \rho_\omega)) = \mathrm{tr}(\rho_\phi^{1/2} (\log \Delta_{\phi,\omega}) \rho_\phi^{1/2}) \qquad (38)$$

in the finite-dimensional non-commutative case [125].
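
The relation between the γ-deviation (35) and its boundary case (37) can be illustrated numerically for measures on a finite set. The sketch below assumes that (35) is read as the sum of (γ mu_i + (1 - γ) nu_i - mu_i^γ nu_i^(1-γ)) / (γ (1 - γ)) and that (37) is read as the sum of nu_i log(nu_i / mu_i); both readings, the sample measures and the function names are assumptions of this example.

```python
import math

def gamma_deviation(mu, nu, gamma):
    """Finite commutative gamma-deviation, assuming the reading of (35) stated above."""
    total = sum(gamma * m + (1.0 - gamma) * n - m ** gamma * n ** (1.0 - gamma)
                for m, n in zip(mu, nu))
    return total / (gamma * (1.0 - gamma))

def kullback_leibler(mu, nu):
    """Boundary deviation, assuming the reading of (37) stated above."""
    return sum(n * math.log(n / m) for m, n in zip(mu, nu))

# Normalised, strictly positive sample measures (hypothetical data).
mu = [0.2, 0.3, 0.5]
nu = [0.4, 0.4, 0.2]

near_zero = gamma_deviation(mu, nu, 1e-6)   # gamma close to 0
limit = kullback_leibler(mu, nu)
print(near_zero, limit)

# Exchanging gamma with 1 - gamma corresponds to exchanging the arguments.
swapped = gamma_deviation(nu, mu, 0.7)
direct = gamma_deviation(mu, nu, 0.3)
print(direct, swapped)
```

For normalised measures the first-order expansion of mu^γ nu^(1-γ) in γ gives exactly the logarithmic term, which is why the value at small γ approaches the Kullback-Leibler expression.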

Despite these similarities, the Jencova-Ojima deviation is not a canonical non-commutative
generalisation of the Zhu-Rohwer deviation. The construction of the former is
dependent on the choice of a fixed reference weight $\psi$, while the latter does not depend on
any additional measure. (Nevertheless, the values taken by the Jencova-Ojima deviation
are independent of the choice of the reference $\psi$.) In what follows, we will use the
Falcone-Takesaki theory in order to introduce a new family of information deviations that is a
proper non-commutative generalisation of the Zhu-Rohwer deviations and is also a proper
generalisation of the Jencova-Ojima deviations.

4.2 General form of family of γ-deviations

Consider the γ-embedding (γ-coordinate) functions on $\mathcal{N}_*^+$ valued in $L_{1/\gamma}(\mathcal{N})$ spaces:

$$\ell_\gamma : \mathcal{N}_*^+ \ni \omega \mapsto \ell_\gamma(\omega) := \frac{\omega^{\gamma}}{\gamma} \in L_{1/\gamma}(\mathcal{N}), \qquad (39)$$

with $\gamma \in\, ]0, 1]$. By definition, the embeddings $\ell_\gamma$ and $\ell_{1-\gamma}$ encode the duality (14),

$$\mathcal{N}_*^+ \times \mathcal{N}_*^+ \ni (\omega, \phi) \mapsto \int \ell_\gamma(\omega) \ell_{1-\gamma}(\phi) \in \mathbb{C}. \qquad (40)$$

Special cases of these maps were introduced by Nagaoka and Amari [107] as coordinate
systems in the normalised commutative finite-dimensional setting, and since then they
became a standard tool of information geometry theory. However, the Nagaoka-Amari
formulation, as well as all its later applications, including the non-commutative
generalisations, is based on using the γ-powers of densities (Radon-Nikodym derivatives or
Pedersen-Takesaki operator densities) with respect to a fixed reference measure, state, or
weight. An important attempt to circumvent this problem in the commutative case was
made by Zhu [170, 173], who considered the spaces of measures constructed through an
equivalence relation based on γ-powers of Radon-Nikodym derivatives, but without fixing
any particular reference measure. However, his work remained unfinished and widely
unknown, and it covered only the commutative case. The embeddings (39) solve these
problems.



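Due to the Falcone-Takesaki theory we are able to make the reference-independent
approach strict and valid in all cases, including the infinite-dimensional non-commutative
one. Let us define [91] the general form of quantum γ-deviation on $\mathcal{N}_*^+$, $D_\gamma : \mathcal{N}_*^+ \times \mathcal{N}_*^+ \ni
(\omega, \phi) \mapsto D_\gamma(\omega, \phi) \in \mathbb{R}$, by

$$D_\gamma(\omega, \phi) := \int \left( \frac{\omega}{1 - \gamma} + \frac{\phi}{\gamma} - \mathrm{re}\,(\ell_\gamma(\omega) \ell_{1-\gamma}(\phi)) \right) = \int \frac{\gamma \omega + (1 - \gamma) \phi - \mathrm{re}\,(\omega^{\gamma} \phi^{1-\gamma})}{\gamma(1 - \gamma)}, \qquad (41)$$

where $\gamma \in\, ]0, 1[$. This definition can be extended to include the boundary values $\gamma \in \{0, 1\}$,
by

$$D_1(\phi, \omega) := D_0(\omega, \phi) := \int \lim_{\gamma \to {}^+0} \left( \frac{\omega}{1 - \gamma} + \frac{\phi}{\gamma} - \mathrm{re}\,(\ell_\gamma(\omega) \ell_{1-\gamma}(\phi)) \right). \qquad (42)$$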
The deviation $D_\gamma$ reduces to the following special cases:

• the Jencova-Ojima γ-deviation (36) when the choice of a reference state $\psi \in \mathcal{N}_{*0}^+$,
or a reference weight $\psi \in \mathcal{W}_0(\mathcal{N})$, for the isometric isomorphism with the
Araki-Masuda $L_p(\mathcal{N}, \psi)$ space is provided,

• the Umegaki-Araki deviation (38) in the representation-independent Petz's form
[118]

$$D_0(\omega, \phi) = D_1(\phi, \omega) = \mathrm{i} \lim_{t \to {}^+0} \frac{\phi}{t} \left( (D\omega : D\phi)_t - \mathbb{I} \right), \qquad (43)$$

for $\|\omega\| = \|\phi\| = 1$, $\mathrm{supp}(\omega) \geq \mathrm{supp}(\phi)$, and the $\gamma \to 0$ limit, where $\mathrm{supp}(\omega)$ denotes the
support projection of $\omega$,

• the Hasegawa γ-deviation [68]

$$D_\gamma(\omega, \phi) = \frac{\psi(\rho_\omega - \rho_\omega^{\gamma} \rho_\phi^{1-\gamma})}{\gamma(1 - \gamma)}, \qquad (44)$$

for semi-finite $\mathcal{N}$, and $\rho_\phi$ and $\rho_\omega$ defined as, respectively, the Pedersen-Takesaki
densities of normalised states $\phi$ and $\omega$ with respect to a faithful normal semi-finite
trace $\psi$ that acts on a standard representation of $\mathcal{N} = L_\infty(\mathcal{N})$ on $\mathcal{H} = L_2(\mathcal{N})$.
In this case the Pedersen-Takesaki densities reduce to the Dye-Segal densities [43,
141], hence $\phi(\cdot) = \psi(\rho_\phi \,\cdot)$ and $\omega(\cdot) = \psi(\rho_\omega \,\cdot)$. If the standard representation of $\mathcal{N}$
on $\mathcal{H}$ is isomorphic to $\mathfrak{B}(\mathcal{H})$ as a von Neumann algebra and the trace $\psi$ is normalised
($\psi(\mathbb{I}) = 1$), then $\psi(\cdot) = \mathrm{tr}(\cdot)$, where tr is a standard trace on $\mathfrak{B}(\mathcal{H})$,

• the Zhu-Rohwer γ-deviation (35) for commutative $\mathcal{N}$, and $\nu_\phi$ absolutely continuous
with respect to $\mu_\omega$,

• the Kullback-Leibler deviation (37) for commutative $\mathcal{N}$, $\nu_\phi$ absolutely continuous
with respect to $\mu_\omega$, $\int \mu_\omega = 1 = \int \nu_\phi$, and the $\gamma \to 0$ limit,

• the Cressie-Read-Amari γ-deviation [3] [32, 3, 100, 131]

$$D_\gamma(\omega, \phi) = \frac{1}{\gamma(1 - \gamma)} \int \nu \, (p_\omega - p_\omega^{\gamma} p_\phi^{1-\gamma}), \qquad (45)$$

for semi-finite commutative $\mathcal{N}$, and normalised measures $\nu_\phi$ and $\mu_\omega$ ($\int \mu_\omega = 1 =
\int \nu_\phi$) that are absolutely continuous with respect to a strictly positive semi-finite
measure $\nu$ on the countably additive Dedekind complete boolean algebra $\mho$ derived
from $\mathcal{N}$. The functions $p_\phi$ and $p_\omega$ are defined, respectively, as the Radon-Nikodym
derivatives of $\nu_\phi$ and $\mu_\omega$ with respect to $\nu$: $\phi(\cdot) = \int \nu_\phi (\cdot) = \int \nu (p_\phi \,\cdot)$, and $\omega(\cdot) =
\int \mu_\omega (\cdot) = \int \nu (p_\omega \,\cdot)$.

[3] The γ-deviation family (45) corresponds bijectively, but is not equal, to the γ-deviation families
of Chernoff [27], Renyi [133], Perez [116], Havrda-Charvat [71], and Tsallis [155] (for a review with
calculations, see e.g. [33]).

It is quite remarkable that the mathematical form of the γ-deviation given by (41) bears
such a strong similarity to its commutative special case (35).

4.3 Properties of γ-deviations

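Let us turn to the discussion of the properties of the family (41). A direct calculation
shows that (41) is a generalised Bregman deviation (26) with

$$\Psi_\gamma : L_{1/\gamma}(\mathcal{N}) \ni x \mapsto \Psi_\gamma(x) := \frac{1}{1 - \gamma} \int |\gamma x|^{1/\gamma} \in [0, \infty[ \qquad (46)$$

and $\Psi_\gamma(\ell_\gamma(\omega)) = \frac{1}{1 - \gamma} \omega(\mathbb{I})$. This deviation has a dualiser $f_{\Psi_\gamma}(x) = \ell_{1-\gamma} \circ \ell_\gamma^{-1}(x)$, which is
a homeomorphism $f_{\Psi_\gamma} : L_{1/\gamma}(\mathcal{N}) \to L_{1/(1-\gamma)}(\mathcal{N})$. Thus, $D_\gamma$ can be written in terms of
the generalised Bregman deviation (26), which takes the form

$$D_\gamma(\omega, \phi) = \Psi_\gamma(\ell_\gamma(\omega)) + \Psi_{1-\gamma}(\ell_{1-\gamma}(\phi)) - \mathrm{re}\,[\![\ell_\gamma(\omega), \ell_{1-\gamma}(\phi)]\!]_{\mathcal{N}}. \qquad (47)$$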

A straightforward calculation based on the definitions (41) and (39) shows that $D_\gamma$
satisfies the generalised cosine equation

$$D_\gamma(\omega, \phi) + D_\gamma(\phi, \psi) = D_\gamma(\omega, \psi) + \mathrm{re} \int (\ell_\gamma(\omega) - \ell_\gamma(\phi)) (\ell_{1-\gamma}(\psi) - \ell_{1-\gamma}(\phi)), \qquad (48)$$

as well as

$$D_\gamma(\omega, \phi) = D_{1-\gamma}(\phi, \omega). \qquad (49)$$

The equation (48) turns into the 'ordinary' generalised cosine equation (32) for the
corresponding standard Bregman deviation (47), while the equation (49) turns into the
'representation-index duality' equation

$$D_{\Psi_\gamma}(y, x) = D_{\Psi_{1-\gamma}}(f_{\Psi_\gamma}(x), f_{\Psi_\gamma}(y)), \qquad (50)$$

where $x, y \in L_{1/\gamma}(\mathcal{N})$. The finite-dimensional commutative version of the equation (50),
with a dualiser given by derivative, was discussed in [168].
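
The identities (48) and (49) can be checked directly for finite positive measures on a finite set, where the non-commutative integral reduces to a sum and no real part is needed. The sketch assumes the commutative reading ell_gamma(omega)_i = omega_i^γ / γ of (39) and the reading D_γ(omega, phi) = sum_i (omega_i / (1 - γ) + phi_i / γ - ell_γ(omega)_i ell_(1-γ)(phi)_i) of (41); the sample data and function names are hypothetical.

```python
def ell(omega, gamma):
    """Commutative gamma-embedding, assuming ell_gamma(omega) = omega^gamma / gamma."""
    return [w ** gamma / gamma for w in omega]

def gamma_deviation(omega, phi, gamma):
    """Finite commutative form of the gamma-deviation under the stated assumptions."""
    a = ell(omega, gamma)
    b = ell(phi, 1.0 - gamma)
    return sum(w / (1.0 - gamma) + p / gamma - x * y
               for w, p, x, y in zip(omega, phi, a, b))

gamma = 0.3
omega = [0.2, 0.5, 1.3]   # finite positive measures, not normalised
phi = [0.7, 0.1, 0.9]
psi = [1.1, 0.4, 0.3]

# Generalised cosine equation: both sides computed independently.
cosine_lhs = gamma_deviation(omega, phi, gamma) + gamma_deviation(phi, psi, gamma)
correction = sum((a - b) * (c - d)
                 for a, b, c, d in zip(ell(omega, gamma), ell(phi, gamma),
                                       ell(psi, 1.0 - gamma), ell(phi, 1.0 - gamma)))
cosine_rhs = gamma_deviation(omega, psi, gamma) + correction

# Representation-index duality: gamma <-> 1 - gamma together with argument exchange.
dual_lhs = gamma_deviation(omega, phi, gamma)
dual_rhs = gamma_deviation(phi, omega, 1.0 - gamma)
print(cosine_lhs, cosine_rhs, dual_lhs, dual_rhs)
```

The cosine identity rests on the pointwise relation ell_γ(phi) ell_(1-γ)(phi) = phi / (γ (1 - γ)), which is what makes the terms linear in phi cancel.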

Furthermore, from the uniform convexity and the uniform smoothness of $L_{1/\gamma}(\mathcal{N})$
spaces it follows that the norm $\|\cdot\|_{1/\gamma}$ is Frechet differentiable at all points $x \in L_{1/\gamma}(\mathcal{N})$
except $x = 0$. The Frechet derivative of the norm allows one to calculate the corresponding
Frechet derivative of the function $\Psi_\gamma(x)$ in the direction $y$, which reads

$$\mathfrak{D}^F(\Psi_\gamma(x))(y) = \mathrm{re}\,[\![y, f_{\Psi_\gamma}(x)]\!]_{\mathcal{N}} = \mathrm{re}\,[\![y, \ell_{1-\gamma} \circ \ell_\gamma^{-1}(x)]\!]_{\mathcal{N}}. \qquad (51)$$

For $\gamma < 1$ the function $\Psi_\gamma(x)$ is Frechet differentiable also at $x = 0$, with

$$\mathfrak{D}^F(\Psi_\gamma(0))(y) = \mathrm{re}\,[\![y, f_{\Psi_\gamma}(0)]\!]_{\mathcal{N}} = 0. \qquad (52)$$

This gives a differential formula for the standard Bregman functional $D_{\Psi_\gamma}$,

$$D_{\Psi_\gamma}(y, x) = \Psi_\gamma(y) + \Psi_{1-\gamma}(\ell_{1-\gamma} \circ \ell_\gamma^{-1}(x)) - \mathfrak{D}^F(\Psi_\gamma(x))(y). \qquad (53)$$

The uniform smoothness and uniform convexity of $L_p(\mathcal{N})$ allow one to strictly follow Jencova's
proof of Proposition 6.1.(i)-(ii) in [84], which results in a proof that, for every $\omega, \phi \in \mathcal{N}_*^+$,

1) $D_\gamma(\omega, \phi) \geq 0$,

2) $D_\gamma(\omega, \phi) = 0 \iff \omega = \phi$.

As stated above, for any choice of a reference weight $\psi \in \mathcal{W}_0(\mathcal{N})$, $D_\gamma$ turns into the
Jencova-Ojima deviation. The latter is in turn the particular case of Petz's deviation
(19) with the operator convex function $f_\gamma$ given by

$$f_\gamma(t) = \frac{1}{\gamma} + \frac{t}{1 - \gamma} - \frac{t^{\gamma}}{\gamma(1 - \gamma)}, \qquad (54)$$

see [84] for a proof. The identification of $D_\gamma$ as a member of Petz's family with the
function (54) can be also obtained more directly, without any reference to the
Araki-Masuda $L_p(\mathcal{N}, \psi)$ spaces. This is provided using the identification (13) of $L_2(\mathcal{N})$ with
the standard representation Hilbert space $\mathcal{H}$, which gives


/ 




where $\xi_\phi$ is a representative of $\phi$ in the standard cone $L_2(\mathcal{N})^+$ of the standard
representation Hilbert space $\mathcal{H}$. The last equality follows from the fact that agrees

with A^^^ for z G K [151]. From the fact that D y is a Petz deviation it follows that it 
satisfies the following properties [119, 84]: 

1) $D_\gamma(\omega, \phi) \geq D_\gamma(T^c(\omega), T^c(\phi))$,

2) $D_\gamma$ is jointly convex on $\mathcal{N}_*^+ \times \mathcal{N}_*^+$,

3) $D_\gamma$ is lower semi-continuous on $\mathcal{N}_*^+ \times \mathcal{N}_*^+$ endowed with the product of norm
topologies.

Using the lower semi-continuity and convexity properties of $D_\gamma$ together with the fact
that $L_{1/\gamma}(\mathcal{N})$ spaces are Banach spaces, one can take advantage of the theorems
on existence and uniqueness of the solutions of the variational problems for the convex
lower semi-continuous functionals in the Banach space setting. In particular, if $\psi \in \mathcal{N}_*^+$,
$C \subseteq \mathcal{N}_*^+$, $\phi \in C$, and $\ell_\gamma(C)$ is a convex set, then the following conditions are equivalent:

i) $D_\gamma(\phi, \psi) = \inf_{\omega \in C} \{D_\gamma(\omega, \psi)\}$,

ii) $D_\gamma(\omega, \psi) \geq D_\gamma(\phi, \psi) + D_{1-\gamma}(\phi, \omega) \ \forall \omega \in C$.

Moreover, if such $P_C^{D_\gamma}(\psi) = \phi$ exists, then it is unique. This condition is satisfied if $C$
is closed in the topology induced by the γ-embedding of the weak topology of $\mathcal{N}_*^+$ in
$L_p(\mathcal{N})$. The explicit proof of these statements can be obtained by repeating the proofs
of Propositions 7.1, 8.1 and 8.2 of [84] with the obvious replacement of $L_{1/\gamma}(\mathcal{N}, \psi)$ by
$L_{1/\gamma}(\mathcal{N})$.

The γ-family of deviation functionals defined by (41) provides a proper
infinite-dimensional non-commutative generalisation of the γ-families of Zhu-Rohwer deviations
and Jencova-Ojima deviations that is defined directly in terms of canonical non-commutative
$L_{1/\gamma}(\mathcal{N})$ spaces and belongs to an intersection of the generalised Bregman
deviations and Petz's deviations. The canonical character of the structures employed in
the construction of $D_\gamma$, and the recent result of Amari [4] for the Zhu-Rohwer deviation in the
finite-dimensional commutative case, lead us to state the following:

Conjecture 1. The family $D_\gamma(\omega, \phi)$, $\gamma \in [0, 1]$, defined by (41) and (42) is the unique
family of deviation functionals on $\mathcal{N}_*^+$ satisfying (18) and (26).

Further properties of the $D_\gamma$ family, the issue of selection of a particular $\gamma \in [0, 1]$, as well as
the extension of other structures of quantum information geometry to the infinite-dimensional
non-commutative case based on the Falcone-Takesaki theory will be the main topic of the
next paper [90].

As a final remark, let us note that for $\gamma = 1/2$ the $L_{1/\gamma}(\mathcal{N})$ space turns into a Hilbert
space $\mathcal{H}$, while the standard Bregman functional $D_{\Psi_\gamma}$ associated with $D_\gamma$ turns into the
norm distance

$$D_{\Psi_{1/2}}(x, y) = \frac{1}{2} \|x - y\|_{\mathcal{H}}^2. \qquad (56)$$

Hence, the Bregman projection of information geometric theory based on γ-deviations
(41) turns in this particular case into the Hilbert space norm minimisation. This shows
that the framework of Hilbert space geometry is just a special case of the much more general
framework of information geometry of spaces of non-commutative integrals. From this
perspective, the traditional application of spectral theory to quantitative construction and
analysis of quantum theoretic models is just a special case of relative entropic modelling.
In consequence [91, 90], it is possible to consider information geometry of spaces of
non-commutative integrals as a new framework for kinematics of quantum information theory
(in particular) and quantum theory (in general), replacing the traditional and
semi-spectral frameworks based on Hilbert spaces and spectral theory [161, 39] as well as the
algebraic frameworks based on algebras of operators and spectral theory [139, 46, 61, 31].
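
The reduction to the Hilbert space norm distance at γ = 1/2 can be seen explicitly for finite positive measures on a finite set. The sketch below assumes the commutative readings ell_(1/2)(omega)_i = 2 sqrt(omega_i) of (39) and D_(1/2)(omega, phi) = sum_i (2 omega_i + 2 phi_i - 4 sqrt(omega_i phi_i)) of (41); the sample data and the function names are hypothetical.

```python
import math

def ell_half(omega):
    """Commutative 1/2-embedding, assuming ell_(1/2)(omega) = 2 * sqrt(omega)."""
    return [2.0 * math.sqrt(w) for w in omega]

def deviation_half(omega, phi):
    """Finite commutative gamma-deviation at gamma = 1/2 under the stated reading."""
    return sum(2.0 * w + 2.0 * p - 4.0 * math.sqrt(w * p) for w, p in zip(omega, phi))

def half_squared_distance(x, y):
    """(1/2) * ||x - y||^2 in the Euclidean (finite-dimensional Hilbert) norm."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))

omega = [0.2, 0.5, 1.3]
phi = [0.7, 0.1, 0.9]

deviation = deviation_half(omega, phi)
distance = half_squared_distance(ell_half(omega), ell_half(phi))
print(deviation, distance)
```

In this special case the deviation is symmetric in its arguments, in agreement with the exchange of γ and 1 - γ leaving γ = 1/2 fixed.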